Near-Data Scheduling for Data Centers with Multiple

Author

  • Yi Lu
Abstract

Data locality is a fundamental issue for data-parallel applications. In Hadoop MapReduce, map task scheduling requires an efficient algorithm that takes data locality into account; otherwise, the system may become unstable under loads inside its capacity region, and jobs may experience undesirably long completion times. The data chunk needed by a map task can be in memory, on a local disk, in the local rack, in the same cluster, or even in another data center. Hence, although much work has gone into improving the speed of data center networks, a task still sees different service rates depending on where its data chunk is stored and which server serves it. Most of the theoretical work on load balancing addresses systems with two levels of data locality, including the Pandas algorithm by Xie et al. and the JSQ-MW algorithm by Wang et al. The former is both throughput optimal and heavy-traffic optimal, while the latter is throughput optimal but heavy-traffic optimal only for a special traffic load. We show that an extension of the JSQ-MW algorithm to a system with three levels of data locality is throughput optimal, but heavy-traffic optimal only in a special traffic scenario rather than for all loads. Furthermore, we show that the Pandas algorithm is not even throughput optimal for a system with three levels of data locality. We then propose a novel algorithm, Balanced-Pandas, which is both throughput optimal and heavy-traffic optimal. To the best of our knowledge, this is the first theoretical work on load balancing for a system with more than two levels of data locality. This setting is more challenging than the two-level case because a dilemma between performance and throughput emerges.
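
To make the three-level setting concrete, the following is a minimal sketch of a weighted-workload router for servers that serve local, rack-local, and remote tasks at different rates. It is an illustration only and is not claimed to be the paper's Balanced-Pandas algorithm: the abstract does not give the routing rule, so the rates in `RATES`, the `Server` class, and the `locality` and `route` helpers are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical service rates (tasks per time slot) for the three locality
# levels; the abstract gives no numbers, so these values are assumptions.
RATES = {"local": 1.0, "rack": 0.8, "remote": 0.5}


@dataclass
class Server:
    rack: int
    # One queue length per locality level (illustrative structure).
    queues: dict = field(
        default_factory=lambda: {"local": 0, "rack": 0, "remote": 0}
    )

    def workload(self) -> float:
        # Expected time to drain every queued task at its own service rate.
        return sum(n / RATES[level] for level, n in self.queues.items())


def locality(server_id, servers, replicas):
    """Classify a server as local, rack-local, or remote for a task whose
    data chunk is stored on the servers listed in `replicas`."""
    if server_id in replicas:
        return "local"
    replica_racks = {servers[r].rack for r in replicas}
    if servers[server_id].rack in replica_racks:
        return "rack"
    return "remote"


def route(servers, replicas):
    """Send an arriving task to the server with the smallest workload,
    weighted by how slowly that server would serve this task.
    Ties go to the lowest server index (a simplification)."""
    def score(i):
        return servers[i].workload() / RATES[locality(i, servers, replicas)]

    best = min(range(len(servers)), key=score)
    level = locality(best, servers, replicas)
    servers[best].queues[level] += 1
    return best, level


if __name__ == "__main__":
    servers = [Server(rack=0), Server(rack=0), Server(rack=1)]
    for _ in range(4):
        print(route(servers, replicas={0}))
```

Dividing the workload by the locality-dependent rate makes a lightly loaded remote server compete with a busier local one, which is the kind of trade-off between locality and load that the abstract alludes to.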


Similar resources

Detailed Scheduling of Tree-like Pipeline Networks with Multiple Refineries

In the oil supply chain, the refined petroleum products are transported by various transportation modes, such as rail, road, vessel and pipeline. The latter provides one of the safest and cheapest ways to connect production areas to local markets. This paper addresses the operational scheduling of a multi-product tree-like pipeline connecting several refineries to multiple distribution centers ...


Trucks Scheduling in a Multi-product Cross Docking System with Multiple Temporary Storages and Multiple Dock Doors

In order to reduce costs and increase efficiency of a supply chain system, cross docking is one of the most important strategies of warehousing for consolidation shipments from different suppliers to different customers. Products are collected from suppliers by inbound trucks and then moved to customers by outbound trucks through cross dock. Scheduling of trucks plays important role in the cros...


A Simulation-Based Optimization Model for Scheduling New Product Development Projects in Research and Development Centers

a simulation-based optimization approach for the purpose of finding a near-optimal answer can be efficient and effective. In the present paper, first, the mathematical model for the project activity scheduling problem has been presented with a job shop approach. Then, using the Arena 14 software, the simulation model has been designed. Consequently, a numerical example has been solved via runni...


Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers

The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. HPC users need the ability to gain rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay b...


Truck scheduling problem in a cross-docking system with release time constraint

In a supply chain, cross-docking is one of the most innovative systems for ameliorating the operational performance at distribution centers. Cross-docking is a logistics strategy in which freight is unloaded from inbound trucks and (almost) directly loaded into outbound trucks, with little or no storage in between, thus no inventory remains at the distribution center. In this study, we consider...


Energy-Efficient Scheduling of HPC Applications in Cloud Computing Environments

The use of High Performance Computing (HPC) in commercial and consumer IT applications is becoming popular. They need the ability to gain rapid and scalable access to high-end computing capabilities. Cloud computing promises to deliver such a computing infrastructure using data centers so that HPC users can access applications and data from a Cloud anywhere in the world on demand and pay based ...



Publication date: 2017